# PDF files

You can use this version of the popular PDFLoader in web environments.
By default, one document is created for each page in the PDF file. To change this behavior, set the `splitPages` option to `false`.

## Setup

```bash npm2yarn
npm install pdf-parse
```

## Usage

import CodeBlock from "@theme/CodeBlock";
import Example from "@examples/document_loaders/web_pdf.ts";

<CodeBlock language="typescript">{Example}</CodeBlock>
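
To picture what the `splitPages` option changes, here is a minimal, self-contained sketch that does not call the loader at all. The `SketchDocument` shape, the `toDocuments` helper, the metadata keys, and the blank-line join between pages are all assumptions made for illustration, not the loader's actual implementation.

```typescript
// Hypothetical, simplified document shape used only in this sketch.
interface SketchDocument {
  pageContent: string;
  metadata: Record<string, unknown>;
}

// Hypothetical helper: turns per-page text into documents.
// Assumption: pages are joined with a blank line when splitPages is false.
function toDocuments(pageTexts: string[], splitPages = true): SketchDocument[] {
  if (splitPages) {
    // One document per page, numbered from 1.
    return pageTexts.map((text, index) => ({
      pageContent: text,
      metadata: { pageNumber: index + 1 },
    }));
  }
  // A single document holding the text of every page.
  return [
    {
      pageContent: pageTexts.join("\n\n"),
      metadata: { totalPages: pageTexts.length },
    },
  ];
}

const pages = ["First page text", "Second page text"];
const perPage = toDocuments(pages);
const merged = toDocuments(pages, false);

console.log(perPage.length); // 2
console.log(merged.length); // 1
```

With the loader itself, the equivalent is passing `{ splitPages: false }` as the second constructor argument.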

## Usage, custom `pdfjs` build

By default, the loader uses the `pdfjs` build bundled with `pdf-parse`, which is compatible with most environments, including Node.js and modern browsers. To use a more recent version or a custom build of `pdfjs-dist`, provide a custom `pdfjs` function that returns a promise resolving to the `PDFJS` object.

In the following example we use the "legacy" (see [pdfjs docs](https://github.com/mozilla/pdf.js/wiki/Frequently-Asked-Questions#which-browsersenvironments-are-supported)) build of `pdfjs-dist`, which includes several polyfills not included in the default build.

```bash npm2yarn
npm install pdfjs-dist
```

```typescript
import { WebPDFLoader } from "langchain/document_loaders/web/pdf";

const blob = new Blob(); // e.g. from a file input

const loader = new WebPDFLoader(blob, {
  // you may need to add `.then(m => m.default)` to the end of the import
  pdfjs: () => import("pdfjs-dist/legacy/build/pdf.js"),
});
```
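
Whether the `.then(m => m.default)` step is needed depends on how your bundler exposes the module. One way to avoid guessing is a small helper that handles both shapes. The sketch below is an assumption-laden illustration: `unwrapDefault` and `FakePdfjs` are hypothetical names, and plain objects stand in for the imported module so the snippet runs on its own.

```typescript
// Hypothetical stand-in for the object a pdfjs build would export.
interface FakePdfjs {
  version: string;
}

// Hypothetical helper: returns the `default` export when the module has one,
// otherwise returns the module itself.
function unwrapDefault<T>(mod: T | { default: T }): T {
  const maybeWrapped = mod as unknown as { default?: T } | null | undefined;
  return maybeWrapped?.default ?? (mod as T);
}

const lib: FakePdfjs = { version: "0.0.0-example" };

// Shape 1: the bundler wraps the library in a `default` export.
const wrappedModule: { default: FakePdfjs } = { default: lib };
// Shape 2: the module namespace is the library itself.
const plainModule: FakePdfjs = lib;

const fromWrapped = unwrapDefault<FakePdfjs>(wrappedModule);
const fromPlain = unwrapDefault<FakePdfjs>(plainModule);

console.log(fromWrapped === lib, fromPlain === lib); // true true
```

Under these assumptions, the `pdfjs` option could be written as `() => import("pdfjs-dist/legacy/build/pdf.js").then(unwrapDefault)`, though the resulting type may need adjusting for your setup.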

## Eliminating extra spaces

PDFs come in many varieties, which makes reading them a challenge. The loader parses individual text elements and joins them with a space by default. If you see excessive spaces in the output, you can override the separator with an empty string:

```typescript
import { WebPDFLoader } from "langchain/document_loaders/web/pdf";

const blob = new Blob(); // e.g. from a file input

const loader = new WebPDFLoader(blob, {
  parsedItemSeparator: "",
});
```
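
To see why the separator matters, consider a PDF whose text arrives as small fragments that already carry their own whitespace. The sketch below is only an illustration: the `fragments` array and the `joinFragments` helper are hypothetical and do not reflect how the loader is implemented internally.

```typescript
// Hypothetical text fragments, as they might be extracted from one PDF page.
const fragments = ["Hel", "lo", " wor", "ld"];

// Hypothetical helper that mirrors the described behavior:
// join fragments with a separator, defaulting to a single space.
function joinFragments(items: string[], separator = " "): string {
  return items.join(separator);
}

const withSpaces = joinFragments(fragments);
const withoutSeparator = joinFragments(fragments, "");

console.log(JSON.stringify(withSpaces)); // "Hel lo  wor ld"
console.log(JSON.stringify(withoutSeparator)); // "Hello world"
```

If your PDF instead produces whole words as separate elements, an empty separator would run them together, so it is worth inspecting the output before choosing a value.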
